InternViT-6B for Image Classification

This folder contains the implementation of InternViT-6B for image classification.

The codebase for this part is derived from InternImage, with some code references to EVA and DINOv2. Thanks for their great work.

InternViT-6B follows the structure of vanilla ViT, and its hyperparameters are listed in the table below.

🛠️ Installation

See INSTALLATION.md

📦 Data Preparation

Please prepare the dataset according to your needs.

ImageNet-1K: We use the standard ImageNet dataset, which you can download from http://image-net.org/.
ImageNet-A: Download it from https://people.eecs.berkeley.edu/~hendrycks/imagenet-a.tar.
ImageNet-R: Download it from https://people.eecs.berkeley.edu/~hendrycks/imagenet-r.tar.
ImageNetV2: Download it from https://imagenetv2public.s3-us-west-2.amazonaws.com/imagenetv2-matched-frequency.tar.gz.

ImageNet-Sketch: Download it using gdown.

# GDown is needed to download the dataset. Please install it via `pip install gdown`
gdown --id 1Mj0i5HBthqH1p_yeXzsg22gZduvgoNeA

First, please prepare the ImageNet-1K, ImageNet-A, ImageNet-R, ImageNetV2, and ImageNet-Sketch datasets following the directory structure outlined below.

$ tree data
data
├── imagenet-1k
│         ├── train
│         │    ├── n01498041
│         │    └── ...
│         └── val
│              ├── ILSVRC2012_val_00000001.JPEG
│              └── ...
├── imagenet-a
│         ├── n01498041
│         └── ...
├── imagenet-r
│         ├── n01443537
│         └── ...
├── imagenet-sketch
│         ├── n01440764
│         └── ...
└── imagenetv2
    └── ImageNetV2-matched-frequency
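
Before launching a job, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repository: the helper name `missing_dataset_dirs` and the constant `EXPECTED_DIRS` are hypothetical, and it only checks that the directories exist, not their contents.

```python
from pathlib import Path

# Expected sub-directories, copied from the tree above (relative to data/).
EXPECTED_DIRS = [
    "imagenet-1k/train",
    "imagenet-1k/val",
    "imagenet-a",
    "imagenet-r",
    "imagenet-sketch",
    "imagenetv2/ImageNetV2-matched-frequency",
]


def missing_dataset_dirs(root):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs("data")
    if missing:
        print("Missing directories:")
        for d in missing:
            print(f"  data/{d}")
    else:
        print("All expected dataset directories are present.")
```

Run it from the folder that contains `data/`; an empty result means every directory in the tree was found.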

Then, unzip train.txt.zip and val.txt.zip in meta_data/.

cd meta_data/
unzip train.txt.zip
unzip val.txt.zip

📦 Model Preparation

model name	type	download	size
InternViT-6B-224px	pytorch	🤗 HF link	12 GB
InternViT-6B-224px-head	pytorch	🤗 HF link	25.7 MB

Please download the above model weights and place them in the pretrained/ folder.

# Create the folder first if it does not exist yet
mkdir -p pretrained && cd pretrained/
wget https://huggingface.co/OpenGVLab/InternVL/resolve/main/intern_vit_6b_224px.pth
wget https://huggingface.co/OpenGVLab/InternVL/resolve/main/intern_vit_6b_224px_head.pth

The directory structure is:

pretrained
├── intern_vit_6b_224px_head.pth
└── intern_vit_6b_224px.pth
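
An interrupted download leaves a file that is much smaller than the sizes listed in the table above (about 12 GB for the backbone and 25.7 MB for the head). The sketch below reports what is on disk so the sizes can be compared by eye. It is an illustration only: `checkpoint_sizes` and `CHECKPOINTS` are hypothetical names, and it does not verify checksums or try to load the weights.

```python
from pathlib import Path

# File names taken from the wget commands above.
CHECKPOINTS = ["intern_vit_6b_224px.pth", "intern_vit_6b_224px_head.pth"]


def checkpoint_sizes(folder="pretrained"):
    """Map each expected checkpoint name to its size in bytes, or None if absent."""
    folder = Path(folder)
    sizes = {}
    for name in CHECKPOINTS:
        path = folder / name
        sizes[name] = path.stat().st_size if path.is_file() else None
    return sizes


if __name__ == "__main__":
    for name, size in checkpoint_sizes().items():
        status = "missing" if size is None else f"{size / 1e6:.1f} MB"
        print(f"{name}: {status}")
```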

🔍 Linear Probing on ImageNet-1K

Note: please install apex before training (see the installation guide above for details).

To train a linear classifier for InternViT-6B on ImageNet with 8 GPUs, run:

python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --cfg configs/intern_vit_6b_1k_224.yaml
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224.yaml --launcher slurm

📊 Evaluation

model name	IN-1K	IN-ReaL	IN-V2	IN-A	IN-R	IN-Sketch	download
intern_vit_6b_1k_224.yaml	88.2	90.4	79.9	77.5	89.8	69.1	ckpt \| log

Evaluate InternViT-6B on ImageNet-1K val with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm

Expected results:

 * Acc@1 88.230 Acc@5 98.474
Accuracy of the network on the 50000 test images: 88.2%
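
Acc@1 and Acc@5 are top-1 and top-5 accuracy: the percentage of images whose true label is among the 1 or 5 highest-scoring classes. The sketch below shows the conventional definition in plain Python on made-up scores. It is not the repository's implementation, and the function name `topk_accuracy` is hypothetical.

```python
def topk_accuracy(scores, targets, k=1):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in topk
    return 100.0 * hits / len(targets)


# Toy example: 3 samples, 4 classes (made-up scores, not model output).
scores = [
    [0.1, 0.7, 0.1, 0.1],  # top prediction is class 1
    [0.4, 0.3, 0.2, 0.1],  # top prediction is class 0, class 1 is second
    [0.1, 0.2, 0.3, 0.4],  # top prediction is class 3
]
targets = [1, 1, 0]

print(f"Acc@1 {topk_accuracy(scores, targets, k=1):.3f}")  # Acc@1 33.333
print(f"Acc@2 {topk_accuracy(scores, targets, k=2):.3f}")  # Acc@2 66.667
```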

Evaluate InternViT-6B on ImageNet-ReaL with 1 GPU:

Note: ImageNet-ReaL evaluation currently supports only single-GPU testing.

python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224_test_imagenet_real.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=1 GPUS_PER_NODE=1 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_real.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm

Expected results:

* ReaL Acc@1 90.437 Acc@5 98.567 loss 0.605
ReaL Accuracy of the network on the 50000 test images: 90.4%

Evaluate InternViT-6B on ImageNetV2 with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224_test_imagenetv2.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenetv2.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm

Expected results:

 * Acc@1 79.940 Acc@5 95.340
Accuracy of the network on the 10000 test images: 79.9%

Evaluate InternViT-6B on ImageNet-A with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224_test_imagenet_a.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_a.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm

Expected results:

 * Acc@1 77.479 Acc@5 92.737
Accuracy of the network on the 7500 test images: 77.5%

Evaluate InternViT-6B on ImageNet-R with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224_test_imagenet_r.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_r.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm

Expected results:

 * Acc@1 89.777 Acc@5 97.023
Accuracy of the network on the 30000 test images: 89.8%

Evaluate InternViT-6B on ImageNet-Sketch with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224_test_imagenet_sketch.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_sketch.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm

Expected results:

 * Acc@1 69.117 Acc@5 88.341
Accuracy of the network on the 50889 test images: 69.1%
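
The non-slurm evaluation commands above differ only in the config file and the number of GPUs. As a convenience, they could be generated from one table. The following is a minimal sketch under that assumption: `EVAL_CONFIGS` and `build_eval_command` are hypothetical names, not part of this repository, and the script only prints the commands rather than running them.

```python
import shlex

# Config files and GPU counts exactly as they appear in the commands above.
EVAL_CONFIGS = {
    "ImageNet-1K": ("configs/intern_vit_6b_1k_224.yaml", 8),
    "ImageNet-ReaL": ("configs/intern_vit_6b_1k_224_test_imagenet_real.yaml", 1),
    "ImageNetV2": ("configs/intern_vit_6b_1k_224_test_imagenetv2.yaml", 8),
    "ImageNet-A": ("configs/intern_vit_6b_1k_224_test_imagenet_a.yaml", 8),
    "ImageNet-R": ("configs/intern_vit_6b_1k_224_test_imagenet_r.yaml", 8),
    "ImageNet-Sketch": ("configs/intern_vit_6b_1k_224_test_imagenet_sketch.yaml", 8),
}


def build_eval_command(benchmark,
                       head="pretrained/intern_vit_6b_224px_head.pth",
                       master_port=12345):
    """Return the torch.distributed.launch argument list for one benchmark."""
    cfg, gpus = EVAL_CONFIGS[benchmark]
    return [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(gpus),
        "--master_port", str(master_port),
        "main.py", "--eval",
        "--cfg", cfg,
        "--resume", head,
    ]


if __name__ == "__main__":
    for name in EVAL_CONFIGS:
        print(f"# {name}")
        print(shlex.join(build_eval_command(name)))
```

To actually launch a run, the returned list can be passed to `subprocess.run(cmd, check=True)` from the classification folder.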
